AITopics | magnitude distribution

Activation Sparsity Opportunities for Compressing General Large Language Models

Dhar, Nobel, Deng, Bobin, Islam, Md Romyull, Nasif, Kazi Fahim Ahmad, Zhao, Liang, Suo, Kun

arXiv.org Artificial Intelligence, Dec-12-2024

Deploying local AI models, such as Large Language Models (LLMs), to edge devices can substantially enhance devices' independent capabilities, alleviate the server's burden, and lower the response time. Owing to this potential, many big tech companies have released several lightweight Small Language Models (SLMs) to bridge this gap. However, there is still strong motivation to deploy more powerful AI models (LLMs) on edge devices and enhance their smartness level. Unlike the conventional approaches for AI model compression, we investigate activation sparsity. The activation sparsity method is orthogonal to, and combinable with, existing techniques to maximize compression rate while maintaining accuracy. Because LLMs' Feed-Forward Network (FFN) components typically comprise a large proportion of parameters (around 2/3), FFN optimizations have a better chance of achieving effective compression. Moreover, our findings are beneficial to general LLMs and are not restricted to ReLU-based models. This work systematically investigates the tradeoff between enforcing activation sparsity and perplexity (accuracy) on state-of-the-art LLMs. Our empirical analysis demonstrates that we can obtain around 50% of main memory and computing reductions for critical FFN components with negligible accuracy degradation. This extra 50% sparsity does not naturally exist in current LLMs; obtaining it requires tuning LLMs' activation outputs by injecting zero-enforcing thresholds. To obtain the benefits of activation sparsity, we provide a guideline for the system architect for LLM prediction and prefetching. Successful prediction allows the system to prefetch the necessary weights while omitting the inactive ones and their successors, thereby lowering cache and memory pollution and reducing LLM execution time on resource-constrained edge devices.
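The zero-enforcing threshold described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the random toy activations, the 50% target, and the quantile-based choice of threshold are all assumptions made for illustration. It shows how zeroing small-magnitude FFN activations lets the following projection skip the corresponding weight rows.

```python
import random

def threshold_activations(acts, tau):
    """Zero every activation whose magnitude is below tau (hypothetical helper)."""
    return [a if abs(a) >= tau else 0.0 for a in acts]

def sparsity(acts):
    """Fraction of entries that are exactly zero."""
    return sum(1 for a in acts if a == 0.0) / len(acts)

def threshold_for_sparsity(acts, target):
    """Pick tau so that roughly `target` of the activations fall below it.
    Assumption: tau is chosen from the empirical magnitude distribution."""
    mags = sorted(abs(a) for a in acts)
    k = int(target * len(mags))
    return mags[k] if k < len(mags) else float("inf")

def sparse_down_projection(acts, w_down):
    """Compute sum_i acts[i] * w_down[i], reading only rows whose activation is nonzero."""
    out = [0.0] * len(w_down[0])
    rows_used = 0
    for a, row in zip(acts, w_down):
        if a == 0.0:
            continue  # inactive neuron: its weight row never needs to be fetched
        rows_used += 1
        for j, w in enumerate(row):
            out[j] += a * w
    return out, rows_used

# Toy data standing in for one token's FFN hidden activations and down-projection.
random.seed(0)
hidden = [random.gauss(0.0, 1.0) for _ in range(1000)]
w_down = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(1000)]

tau = threshold_for_sparsity(hidden, 0.5)
sparse_hidden = threshold_activations(hidden, tau)
out, rows_used = sparse_down_projection(sparse_hidden, w_down)
```

In this toy setting about half of the weight rows are skipped. Whether accuracy survives such a threshold on a real model is the empirical question the paper studies, and the sketch does not address it.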

large language model, machine learning, sparsity, (19 more...)

arXiv.org Artificial Intelligence

2412.12178

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The Palomar twilight survey of 'Ayló'chaxnim, Atiras, and comets

Bolin, B. T., Masci, F. J., Coughlin, M. W., Duev, D. A., Ivezić, Ž., Jones, R. L., Yoachim, P., Ahumada, T., Bhalerao, V., Choudhary, H., Contreras, C., Cheng, Y. -C., Copperwheat, C. M., Deshmukh, K., Fremling, C., Granvik, M., Hardegree-Ullman, K. K., Ho, A. Y. Q., Jedicke, R., Kasliwal, M., Kumar, H., Lin, Z. -Y., Mahabal, A., Monson, A., Neill, J. D., Nesvorný, D., Perley, D. A., Purdum, J. N., Quimby, R., Serabyn, E., Sharma, K., Swain, V.

arXiv.org Artificial Intelligence, Sep-23-2024

Near-Sun twilight sky observations allow for the detection of asteroids interior to the orbits of Venus (Aylos) and the Earth (Atiras), as well as comets. We present the results of observations with the Palomar 48-inch telescope (P48)/Zwicky Transient Facility (ZTF) camera in 30 s r-band exposures taken during evening astronomical twilight from 2019 Sep 20 to 2022 March 7 and during morning astronomical twilight from 2019 Sep 21 to 2022 Sep 29. More than 46,000 exposures were taken in evening and morning astronomical twilight within 31 to 66 degrees from the Sun with an r-band limiting magnitude between 18.1 and 20.9. The twilight pointings show a slight seasonal dependence in limiting magnitude and ability to point closer towards the Sun, with limiting magnitude slightly improving during summer. In total, one Aylo, (594913) 'Ayló'chaxnim, and 4 Atiras, 2020 OV1, 2021 BS1, 2021 PB2, and 2021 VR3, were discovered in evening and morning twilight observations. Additional twilight survey discoveries, made using deep learning comet detection pipelines, include 6 long-period comets (C/2020 T2, C/2020 V2, C/2021 D2, C/2021 E3, C/2022 E3, and C/2022 P3) and two short-period comets (P/2021 N1 and P/2022 P2). The P48/ZTF twilight survey also recovered 11 known Atiras, one Aylo, three short-period comets, two long-period comets, and one interstellar object. Lastly, the Vera Rubin Observatory will conduct a twilight survey starting in its first year of operations and will cover the sky within 45 degrees of the Sun. Twilight surveys such as those by ZTF and future surveys will provide opportunities for discovering asteroids inside the orbits of Earth and Venus.

angular distance, recovery, survey field, (13 more...)

arXiv.org Artificial Intelligence

2409.15263

Country:

North America > United States > Arizona > Pima County > Tucson (0.14)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(24 more...)

Genre: Research Report (0.82)

Industry: Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.34)

Unlocking Continual Learning Abilities in Language Models

Du, Wenyu, Cheng, Shuang, Luo, Tongxu, Qiu, Zihan, Huang, Zeyu, Cheung, Ka Chun, Cheng, Reynold, Fu, Jie

arXiv.org Artificial Intelligence, Jun-24-2024

Language models (LMs) exhibit impressive performance and generalization capabilities. However, LMs struggle with the persistent challenge of catastrophic forgetting, which undermines their long-term sustainability in continual learning (CL). Existing approaches usually address the issue by incorporating old task data or task-wise inductive bias into LMs. However, old data and accurate task information are often unavailable or costly to collect, hindering the availability of current CL approaches for LMs. To address this limitation, we introduce MIGU (MagnItude-based Gradient Updating for continual learning), a rehearsal-free and task-label-free method that only updates the model parameters with large magnitudes of output in LMs' linear layers. MIGU is based on our observation that the L1-normalized magnitude distribution of the output in LMs' linear layers is different when the LMs deal with different task data. By imposing this simple constraint on the gradient update process, we can leverage the inherent behaviors of LMs, thereby unlocking their innate CL abilities. Our experiments demonstrate that MIGU is universally applicable to all three LM architectures (T5, RoBERTa, and Llama2), delivering state-of-the-art or on-par performance across continual finetuning and continual pre-training settings on four CL benchmarks. For example, MIGU brings a 15.2% average accuracy improvement over conventional parameter-efficient finetuning baselines in a 15-task CL benchmark. MIGU can also seamlessly integrate with all three existing CL types to further enhance performance. Code is available at https://github.com/wenyudu/MIGU.
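The idea of updating only the parameters tied to large-magnitude outputs can be sketched in plain Python. This is an illustrative reading of the abstract and not the code from the MIGU repository: the helper names, the row-wise mask, the `keep_fraction` parameter, and the toy weights and gradients are assumptions.

```python
def linear_forward(weight, x):
    """Output of a bias-free linear layer; weight has one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

def l1_normalized_magnitude(output):
    """|y_i| divided by the L1 norm of the output vector."""
    total = sum(abs(o) for o in output)
    if total == 0.0:
        return [0.0] * len(output)
    return [abs(o) / total for o in output]

def top_fraction_mask(scores, keep_fraction):
    """True for the output units with the largest scores (hypothetical selection rule)."""
    k = max(1, int(round(keep_fraction * len(scores))))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [i in keep for i in range(len(scores))]

def masked_sgd_step(weight, grad, mask, lr):
    """Apply a gradient step only to rows whose output unit was selected."""
    return [
        [w - lr * g for w, g in zip(w_row, g_row)] if keep else list(w_row)
        for w_row, g_row, keep in zip(weight, grad, mask)
    ]

# Toy layer with 4 output units and 2 inputs.
weight = [[1.0, 0.0], [0.0, 0.1], [2.0, 2.0], [0.5, -0.5]]
x = [1.0, 1.0]

output = linear_forward(weight, x)          # [1.0, 0.1, 4.0, 0.0]
scores = l1_normalized_magnitude(output)    # magnitudes divided by 5.1
mask = top_fraction_mask(scores, 0.5)       # keeps units 2 and 0

grad = [[1.0, 1.0] for _ in weight]         # placeholder gradient, not a real loss
new_weight = masked_sgd_step(weight, grad, mask, lr=0.1)
```

Rows 1 and 3 are left untouched because their outputs have the smallest normalized magnitudes for this input. In this reading, different inputs select different rows, which is how magnitude-based masking could keep updates for different tasks from overwriting each other.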

benchmark, language model, linear layer, (14 more...)

arXiv.org Artificial Intelligence

2406.17245

Country:

North America > United States > Oregon (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Generalized earthquake frequency–magnitude distribution described by asymmetric Laplace mixture modelling

#artificialintelligence, Oct-1-2019, 06:38:53 GMT

The complete part of the earthquake frequency–magnitude distribution, above the completeness magnitude mc, is well described by the Gutenberg–Richter law. On the other hand, incomplete data does not follow any specific law, since the shape of the frequency–magnitude distribution below max(mc) is a function of mc heterogeneities that depend on the spatiotemporal configuration of the seismic network. This paper attempts to solve this problem by presenting an asymmetric Laplace mixture model, defined as the weighted sum of Laplace (or double exponential) distribution components of constant mc, where the inverse scale parameter of the exponential function is the detection parameter κ below mc, and the Gutenberg–Richter β-value above mc. Using a variant of the Expectation-Maximization algorithm, the mixture model confirms the ontology proposed by Mignan [2012, https://doi.org/10.1029/2012JB009347]. The performance of the proposed mixture model is analysed, with encouraging results obtained in simulations and in eight real earthquake catalogues that represent different seismic network spatial configurations.
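The component density described above can be written out as a short Python sketch. It rises as exp(κ(m − mc)) below mc and decays as exp(−β(m − mc)) above mc, and the normalizing constant κβ/(κ + β) follows from integrating the two exponential branches. The parameter values, the two-component weights, and the midpoint-rule integrator are illustrative assumptions, and the Expectation-Maximization fitting step is not shown.

```python
import math

def asymmetric_laplace_pdf(m, mc, kappa, beta):
    """Density with inverse scale kappa below mc and beta above mc."""
    norm = kappa * beta / (kappa + beta)
    if m < mc:
        return norm * math.exp(kappa * (m - mc))
    return norm * math.exp(-beta * (m - mc))

def mixture_pdf(m, components):
    """Weighted sum of components given as (weight, mc, kappa, beta) tuples."""
    return sum(w * asymmetric_laplace_pdf(m, mc, kappa, beta)
               for w, mc, kappa, beta in components)

def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule integral, used here only as a sanity check."""
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

# Illustrative values: b-value 1 gives beta = ln(10); kappa is chosen steeper.
beta = 1.0 * math.log(10)
kappa = 3.0 * math.log(10)
components = [(0.6, 1.0, kappa, beta), (0.4, 2.0, kappa, beta)]

total = integrate(lambda m: mixture_pdf(m, components), -5.0, 15.0)
ratio = (asymmetric_laplace_pdf(3.0, 1.0, kappa, beta)
         / asymmetric_laplace_pdf(2.0, 1.0, kappa, beta))
```

With weights that sum to one, `total` should be close to 1. Above mc, `ratio` reproduces the Gutenberg–Richter decay of a factor exp(−β) per magnitude unit.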

asymmetric laplace mixture, magnitude distribution, mixture model, (4 more...)

#artificialintelligence

AI-Alerts: 2019 > 2019-10 > AAAI AI-Alert for Oct 1, 2019 (1.00)

Technology: Information Technology > Artificial Intelligence (0.60)
